STOCHASTIC ANALYSIS OF MULTIPLE-PASSBAND SPECTRAL
CLASSIFICATIONS SYSTEMS AFFECTED BY OBSERVATION ERRORS 

ABSTRACT 

The problem of classifying targets viewed by a "pushbroom"-type
multiple-band spectral scanner by means of algorithms suitable
for implementation in high-speed online digital circuits is
considered. A class of algorithms suitable for use with a
pipelined classifier is investigated through simulations based
on observed data from agricultural targets. The time
distribution of target types is shown to be an important
determining factor in classification efficiency.


I. INTRODUCTION 

In considering the problem of rapid, efficient classification
of images by an online processor in an earth-resources
satellite, the complexity and speed of the processor represent
extremely important constraints on the classification
algorithms which can be used. In the present study, we
consider classification algorithms which may be used with
digital devices organized into a pipeline processor designed
to operate synchronously with a "pushbroom"-type spectral
scanner. This particular study is based on analyses and
simulations of a three-stage classifier designed to
discriminate among the crops represented in the data contained
in the LARS tape obtained from Purdue University, but the
methodology is easily applied to any number of stages.

The architecture of the system is described in schematic
form in Figure 1. At time t, the spectral scanner delivers
three signals: the output from filter 1 for the target it
scans at time t; the output from filter 2 at time t (which
represents the signal from the same target which filter 1
scanned at time t-1); and the output from filter 3 at time t
(corresponding to the target scanned by filter 1 at time t-2).
These signals are represented by $x_1(t)$, $x_2(t-1)$, and
$x_3(t-2)$. The indices follow the convention, which we adopt
throughout this investigation, that t represents the time at
which the target being analyzed crossed filter 1. At each
stage, a processor uses the signal from the corresponding
filter and information stored in a pattern library in system
memory to produce a vector $C_i$, $i = 1, 2, 3$, of information
for classification. In the systems which we consider, this
information will always be a Bayesian estimator of the vector
of a posteriori probabilities for each type of source. The
three algorithms considered in the present study can be
implemented by processors which can complete the updating of
$C$ in a time on the order of a few hundred microseconds, thus
permitting synchronous operation.
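
The indexing convention can be illustrated with a small sketch. The function names are hypothetical and only restate the timing described above: at time t, filter m views the target that crossed filter 1 at time t-(m-1), so a target is fully observed two cycles after it crosses filter 1 in a three-filter scanner.

```python
def target_seen_by(filter_index, t):
    """Target (labelled by its filter-1 crossing time) that a filter views at time t."""
    return t - (filter_index - 1)


def fully_observed_at(target_time, n_filters=3):
    """Cycle at which the last filter has viewed the given target."""
    return target_time + (n_filters - 1)


for m in (1, 2, 3):
    print(f"time 10: filter {m} views target {target_seen_by(m, 10)}")
```

The same arithmetic extends to any number of stages by changing `n_filters`.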

In Section II we discuss the actual crop data that were
used in the present study. A description of the stochastic
model on which the classification algorithms are based is
given in Section III. In Section IV, we give a detailed
description of the Bayesian classification algorithms,
including the results for the two different types of
classifiers that were employed in the present study. A summary
and the recommendations of the present investigation are
presented in Sections V and VI.

A listing of the software that was developed for the
present investigation is given in Appendix A of this report.
II. CALIBRATION DATA


The actual data on which the statistics in this report
are based are the spectral scans of crops contained in the
LARS tape obtained from Purdue University. Of the spectra
on this tape, 2434 are optical spectra of crops for which
60 or more observations are available.

We considered 12 possible filters, each of which has a
rectangular passband spanning six consecutive wavelengths
on the tape, and selected those three filters which provided
the highest entropy for the joint distribution of $x_1$, $x_2$,
and $x_3$, thus maximizing the total information reaching the
classifier. Since the correlation between adjacent wavelengths
is very high, it is not surprising that the filters chosen
were widely spaced. The three filters chosen, in order by
decreasing conditional entropy of the output distribution,
were:


Filter 1: Wavelengths 4 - 9 

Filter 2: Wavelengths 24 - 29 

Filter 3: Wavelengths 59 - 64 
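
The selection rule can be sketched as follows. This is a minimal illustration rather than the procedure actually run for the report: it assumes the candidate filter outputs have already been reduced to discrete levels, and the names `joint_entropy`, `best_filter_triple`, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(rows):
    """Shannon entropy (bits) of the empirical distribution of the tuples in rows."""
    counts = Counter(rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def best_filter_triple(levels):
    """levels[f] lists the discretized outputs of candidate filter f, one per spectrum."""
    def score(triple):
        return joint_entropy(list(zip(*(levels[f] for f in triple))))
    return max(combinations(range(len(levels)), 3), key=score)


# Toy data: candidates 0 and 1 are perfectly correlated, so pairing them adds nothing.
toy = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
print(best_filter_triple(toy))
```

In the toy example any triple containing both correlated candidates carries only 2 bits, while a triple of mutually informative candidates carries 3 bits, which mirrors the observation that widely spaced filters were selected.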

The crops considered, and the number of observations
for each, are given in Table 1, which also contains the
$\chi^2$ (chi-squared) statistics for the test of normality
described below.

In order to determine whether linear discriminant
analysis should be considered as a classification algorithm,
we used a statistical test of multivariate normality for
the three-dimensional vectors of filter outputs. Let $\bar{x}$
be the sample mean of $x$, the vector of filter outputs, for a
given source class, and let $S$ be the sample covariance
matrix. Then, under the hypothesis of normality, the random
variable

$$d^2 = (x - \bar{x})^T S^{-1} (x - \bar{x})$$

should have a $\chi^2$ distribution with 3 degrees of freedom,
and

$$P(\chi^2_3 \le d^2)$$

should be uniformly distributed. In order to test the
uniformity of the distribution of this latter statistic,
we divided the unit interval into ten equal subintervals
and used the conventional $\chi^2$ test (with 9 degrees of
freedom) for the resulting one-way contingency table. It is
seen from Table 1 that neither $x$ nor $\log(x)$ passed this
test.

We thus conclude that linear discriminant analysis is
inappropriate and we must consider only nonparametric
classification algorithms based on the empirical sample
frequency tables.
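
The test just described can be sketched in a few lines. This is an illustration under stated assumptions, not the SAS program used for the report: it takes the test variable to be the squared Mahalanobis distance of each observation from the class mean, uses the closed-form chi-squared distribution function for 3 degrees of freedom, and all function names are hypothetical.

```python
import math
import random


def chi2_cdf_3dof(d2):
    """Exact distribution function of chi-squared with 3 degrees of freedom."""
    if d2 <= 0.0:
        return 0.0
    return math.erf(math.sqrt(d2 / 2)) - math.sqrt(2 * d2 / math.pi) * math.exp(-d2 / 2)


def mean_and_cov(xs):
    n = len(xs)
    mean = [sum(x[a] for x in xs) / n for a in range(3)]
    cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in xs) / (n - 1)
            for b in range(3)] for a in range(3)]
    return mean, cov


def inverse_3x3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def mahalanobis_sq(x, mean, inv_cov):
    d = [x[a] - mean[a] for a in range(3)]
    return sum(d[a] * inv_cov[a][b] * d[b] for a in range(3) for b in range(3))


def uniformity_chi2(p_values, bins=10):
    """Pearson statistic for equal subintervals of [0, 1] (bins - 1 degrees of freedom)."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    expected = len(p_values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def normality_statistic(xs):
    mean, cov = mean_and_cov(xs)
    inv_cov = inverse_3x3(cov)
    return uniformity_chi2([chi2_cdf_3dof(mahalanobis_sq(x, mean, inv_cov)) for x in xs])


# Illustrative data only: 200 synthetic three-component normal vectors.
rng = random.Random(0)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
print(round(normality_statistic(sample), 2))
```

The resulting statistic would be compared with a chi-squared critical value for 9 degrees of freedom (16.92 at the 5 percent level); applying the same function to the logarithms of the outputs gives the lognormality version of the test.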



III. STOCHASTIC MODEL


The classification algorithms which we consider in
the present investigation are based on a Markov process
in which the probability of source s(t) at time t is
governed by the recurrence relation

$$p(t) = (1-\alpha)\,p(t-1) + \alpha\,\pi .$$

Here $p$ is the vector of probabilities for each source type,
$\alpha \in [0,1]$ is the one-step transition probability, and
$\pi$ is the vector of unconditional probabilities. In the
absence of information concerning the relative abundances of
the various crops we assume that

$$\pi^T = \tfrac{1}{5}\,(1, 1, 1, 1, 1)$$

for the five sources used in these simulations. We test
each of the classification algorithms on three 10,000-point
simulations. The values used in generating the pseudorandom
samples are $\alpha = .2$, $\alpha = .5$, and $\alpha = .8$,
representing high, intermediate, and low persistence,
respectively.
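
A short worked derivation may help in reading the recurrence. It uses only the relation above, writing $p(t)$ for the vector of source probabilities, $\alpha$ for the one-step transition probability, and $\pi$ for the unconditional probabilities; the initial vector $p(0)$ is a symbol introduced here for illustration.

```latex
\begin{aligned}
p(t) - \pi &= (1-\alpha)\,p(t-1) + \alpha\,\pi - \pi
            = (1-\alpha)\,\bigl[p(t-1) - \pi\bigr], \\
p(t)       &= \pi + (1-\alpha)^{t}\,\bigl[p(0) - \pi\bigr].
\end{aligned}
```

The influence of the initial state therefore decays geometrically at rate $1-\alpha$. If a fresh draw from $\pi$ occurs with probability $\alpha$ at each step, the expected spacing between fresh draws is $1/\alpha$ steps, that is, 5, 2, and 1.25 steps for $\alpha = .2$, $.5$, and $.8$, which is the sense in which small $\alpha$ means high persistence.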

The information used in classification is initially
decoded by comparison with threshold values as follows:

$$\phi_1(t) = i \quad\text{iff}\quad \tau_{i1} < x_1 \le \theta_{i1},$$

$$\phi_2(t) = j \quad\text{iff}\quad \tau_{j2}(\phi_1) < x_2 \le \theta_{j2}(\phi_1),$$

$$\phi_3(t) = k \quad\text{iff}\quad \tau_{k3}(\phi_1,\phi_2) < x_3 \le \theta_{k3}(\phi_1,\phi_2),$$

defined for $1 \le i \le 5$, $1 \le j \le 5$, $1 \le k \le 5$.


Note that

$$\tau_{11} = \tau_{12} = \tau_{13}$$

and

$$\theta_{11} = \theta_{12} = \theta_{13} .$$

The thresholds are chosen such that $\phi$ is uniquely defined
and the entropy of the distribution of $\phi$ is maximized.
Note that the thresholds used for discretization at each
step depend on the results of the previous step. This
procedure considerably increases the information contained
in $\phi$, since the three filter outputs $x$ are not
stochastically independent.
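
A minimal sketch of such a conditional discretization is given below. It assumes that maximizing the entropy of a K-level code amounts to choosing (nearly) equally populated levels, so the cut points are taken as sample quantiles; that choice, and the names `quantile_cuts`, `fit_thresholds`, and `discretize`, are illustrative assumptions rather than the procedure of the report.

```python
from bisect import bisect_left
from collections import Counter


def quantile_cuts(values, k):
    """Upper cut points that split values into k nearly equally populated levels."""
    s = sorted(values)
    if not s:
        return []
    return [s[max((m * len(s)) // k - 1, 0)] for m in range(1, k)]


def level(x, cuts):
    """1-based level i with cuts[i-2] < x <= cuts[i-1]; unbounded at both ends."""
    return bisect_left(cuts, x) + 1


def fit_thresholds(samples, k):
    """Cut points for x1, for x2 given the x1 level, and for x3 given both levels."""
    cuts1 = quantile_cuts([x[0] for x in samples], k)
    cuts2, cuts3 = {}, {}
    for i in range(1, k + 1):
        sub1 = [x for x in samples if level(x[0], cuts1) == i]
        cuts2[i] = quantile_cuts([x[1] for x in sub1], k)
        for j in range(1, k + 1):
            sub2 = [x for x in sub1 if level(x[1], cuts2[i]) == j]
            cuts3[i, j] = quantile_cuts([x[2] for x in sub2], k)
    return cuts1, cuts2, cuts3


def discretize(x, thresholds):
    cuts1, cuts2, cuts3 = thresholds
    phi1 = level(x[0], cuts1)
    phi2 = level(x[1], cuts2[phi1])
    phi3 = level(x[2], cuts3[phi1, phi2])
    return phi1, phi2, phi3


# Toy data in which the second output duplicates the first.
samples = [(a, a, 0) for a in range(8)]
thresholds = fit_thresholds(samples, 2)
print(Counter(discretize(x, thresholds)[:2] for x in samples))
```

In the toy data a fixed threshold on the second output would only ever produce the pairs (1,1) and (2,2), whereas the conditional thresholds populate all four pairs equally, which is the gain in information that the text attributes to this procedure.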

Furthermore, we assume that the probability distribution
of $\phi$ is defined by

$$P\{\phi(t) = \phi\} = \sum_{i=1}^{5} p_i(t)\, f_i(\phi) .$$

This assumption amounts to ignoring serial autocorrelation
of the $\phi$'s whenever $s(t) \ne s(t-1)$. We do not ignore the
dependence between $\phi(t)$ and $\phi(t-1)$ due to the Markovian
dependence of $p(t)$ on $p(t-1)$. This assumption may
understate the extent of serial autocorrelation in the signal,
that is, overestimate the source entropy, but it seems the
most reasonable assumption to use in the absence of
sufficiently detailed observational data. The relevant
probabilities could be estimated from observational runs in
which groups of consecutive observations have not been
averaged during preliminary processing.

The actual generation of sample points for simulations
is performed according to the following algorithm based on
the stochastic process described above:

1. Choose s(1) at random from distribution $\pi$.

2. For t from 2 to 10,000 perform steps 3 through 5.

3. Generate $u \in (0,1)$ with uniform distribution.

4. If $u < \alpha$, choose s(t) at random from distribution $\pi$;
   otherwise set s(t) = s(t-1).

5. Choose x(t) at random from the actual observations
   for source type s(t) in the data tape.

The pseudorandom runs of 10,000 points generated for each
of the three values of $\alpha$ by this algorithm are used in
all of the tests of classification algorithms described below.
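
The steps above can be sketched as follows. The function name, the dictionary layout of the observations, and the fixed seed are assumptions made for illustration; the sketch also assumes a uniform distribution over source types and draws an observation for t = 1 as well, which the listed steps leave implicit.

```python
import random


def simulate(observations, alpha, n_points=10_000, seed=0):
    """Return a list of (source, spectrum) pairs.

    observations maps each source type to its list of observed spectra.
    """
    rng = random.Random(seed)
    sources = sorted(observations)
    s = rng.choice(sources)                           # step 1 (uniform pi assumed)
    run = [(s, rng.choice(observations[s]))]          # x(1): assumed drawn like step 5
    for _ in range(2, n_points + 1):                  # step 2
        u = rng.random()                              # step 3
        if u < alpha:                                 # step 4: fresh draw from pi
            s = rng.choice(sources)
        run.append((s, rng.choice(observations[s])))  # step 5
    return run


# Toy observation library with two source types (illustrative values only).
library = {1: [(0.1, 0.2, 0.3), (0.2, 0.1, 0.3)], 2: [(0.9, 0.8, 0.7)]}
run = simulate(library, alpha=0.2, n_points=1000)
changes = sum(a[0] != b[0] for a, b in zip(run, run[1:]))
print(len(run), changes)
```

Note that a fresh draw in step 4 may reproduce the previous source type, so the observed number of type changes is smaller than the number of fresh draws.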


IV. CLASSIFICATION ALGORITHMS

As was pointed out in the discussion of the observa- 
tional data, the filter outputs x fit both normal and 
lognormal distributions so poorly that linear discriminant 
analysis is unsuitable for the problem at hand. For this 
reason, we have concentrated on Bayesian discrimination as 
a classification technique. 

A Bayesian classification could be based on either
nonparametric density estimates for each source class or
on contingency tables from discretized data. The large
amount of computation required by nonparametric density
estimators argues against applying them in an on-line
image-analysis device. The high storage requirements and
search times required for nearest-neighbor classification
similarly appear to preclude the use of this technique in
the present application.

Bayesian discrimination based on discretized scanner
outputs requires only relatively modest amounts of memory
and is well adapted for a pipeline architecture consistent
with very rapid operation. An n-stage classifier for M
categories based on a K-level discretization of each filter
output requires only

,n-l 


real values in memory. The basic mathematical operations
used are elementwise multiplication and addition of
M-dimensional vectors, a circumstance which also favors
implementation by parallel-processor systems.

We now define the Bayesian update operation as a
vector-valued function of two vectors by

$$B_i(p, q) = \frac{p_i\, q_i}{\sum_{m=1}^{M} p_m\, q_m} .$$


Now let f(i;j,k,l) denote the probability of the event
$\phi = (j,k,l)$ occurring with source i, normalized so that

$$\sum_j \sum_k \sum_l f(i;j,k,l) = \pi(i) .$$

We define three sets of likelihood vectors by

$$\lambda_{1i}(j) = \frac{\sum_k \sum_l f(i;j,k,l)}{\pi(i)},$$

$$\lambda_{2i}(j,k) = \frac{\sum_l f(i;j,k,l)}{\pi(i)\,\lambda_{1i}(j)},$$

and

$$\lambda_{3i}(j,k,l) = \frac{f(i;j,k,l)}{\pi(i)\,\lambda_{1i}(j)\,\lambda_{2i}(j,k)} .$$
The vector of posterior probabilities for the categories
based on a single observation $\phi = (j,k,l)$ could then be
computed as follows:

$$C_1 = B(\pi, \lambda_1(j)),$$

$$C_2 = B(C_1, \lambda_2(j,k)),$$

$$C_3 = B(C_2, \lambda_3(j,k,l)) .$$

This three-stage calculation is consistent with the proposed
architecture of the classifier. The use of a three-step
calculation offers no particular advantage for the
single-observation classifier just described, but does enhance
the speed of classifiers based on the more sophisticated
algorithms described below.
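
The single-observation cascade can be sketched as below. This is a minimal illustration under stated assumptions: the joint table `f` holds the probability of each cell (j,k,l) together with source i, each stage's likelihood is the conditional probability of the newly observed level given the source and the levels already seen, and the function names and toy numbers are hypothetical.

```python
def bayes_update(p, q):
    """B(p, q): elementwise product of two M-vectors, renormalized to sum to one."""
    w = [a * b for a, b in zip(p, q)]
    total = sum(w)
    return [v / total for v in w]


def _ratio(num, den):
    return num / den if den else 0.0


def likelihood_tables(f, pi, K):
    """f[i][(j, k, l)] is the joint probability of source i and cell (j, k, l)."""
    M, lv = len(pi), range(1, K + 1)
    lam1, lam2, lam3 = {}, {}, {}
    for j in lv:
        lam1[j] = [_ratio(sum(f[i][j, k, l] for k in lv for l in lv), pi[i])
                   for i in range(M)]
        for k in lv:
            lam2[j, k] = [_ratio(sum(f[i][j, k, l] for l in lv), pi[i] * lam1[j][i])
                          for i in range(M)]
            for l in lv:
                lam3[j, k, l] = [_ratio(f[i][j, k, l],
                                        pi[i] * lam1[j][i] * lam2[j, k][i])
                                 for i in range(M)]
    return lam1, lam2, lam3


def classify(phi, pi, tables):
    """Three Bayesian updates, one per filter stage."""
    j, k, l = phi
    lam1, lam2, lam3 = tables
    c1 = bayes_update(pi, lam1[j])
    c2 = bayes_update(c1, lam2[j, k])
    return bayes_update(c2, lam3[j, k, l])


# Toy joint table for M = 2 sources and K = 2 levels (illustrative numbers only).
K, pi = 2, [0.5, 0.5]
cells = [(j, k, l) for j in (1, 2) for k in (1, 2) for l in (1, 2)]
weights = [{c: 1 + c[0] + 2 * c[1] + 3 * c[2] for c in cells},
           {c: 1 + 3 * c[0] + c[1] * c[2] for c in cells}]
f = [{c: pi[i] * w[c] / sum(w.values()) for c in cells} for i, w in enumerate(weights)]
tables = likelihood_tables(f, pi, K)
posterior = classify((1, 2, 2), pi, tables)
print([round(v, 4) for v in posterior])
```

With these definitions the output of the third stage equals the posterior obtained directly from the full table, f(i;j,k,l) divided by its sum over i, so the cascade loses nothing while letting each stage start as soon as its own filter output arrives.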

To take advantage of the Markovian character of the
assumed stochastic process, we may add a fourth
computational stage

$$C_4(t) = B\bigl(C_3(t),\; (1-\alpha^*)\,C_4(t-1) + \alpha^*\,\pi\bigr),$$

where $\alpha^*$ is the estimated transition probability for the
Markov process. Such a postprocessor increases the lag
between observation and classification by one cycle time
but does not affect synchronous operation if implemented
in a pipelined system. The results of using classifiers
based on $\alpha^* = 1$, $.8$, $.5$, and $.2$ on 10,000-point
simulations with actual transition probabilities
$\alpha = .8$, $.5$, and $.2$ are given in Table 2.
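
A minimal sketch of the fourth stage is given below. The function names are hypothetical, and starting the recursion from the unconditional probabilities is an assumption made for illustration.

```python
def bayes_update(p, q):
    """B(p, q): elementwise product of two M-vectors, renormalized to sum to one."""
    w = [a * b for a, b in zip(p, q)]
    total = sum(w)
    return [v / total for v in w]


def markov_postprocess(c3_sequence, pi, alpha_star):
    """Apply C4(t) = B(C3(t), (1 - alpha*) C4(t-1) + alpha* pi) along a run."""
    c4 = list(pi)                      # assumed starting value for the recursion
    out = []
    for c3 in c3_sequence:
        predicted = [(1 - alpha_star) * c + alpha_star * p for c, p in zip(c4, pi)]
        c4 = bayes_update(c3, predicted)
        out.append(c4)
    return out


# Weak but consistent evidence for the first of two source types.
pi = [0.5, 0.5]
evidence = [[0.6, 0.4]] * 4
print([round(c[0], 3) for c in markov_postprocess(evidence, pi, 0.2)])
print([round(c[0], 3) for c in markov_postprocess(evidence, pi, 1.0)])
```

With a small assumed transition probability the consistent evidence accumulates from one cycle to the next, while the value 1 reproduces the third-stage output unchanged when the unconditional probabilities are uniform.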
As one might expect, the classification is most
efficient when $\alpha = \alpha^*$. The algorithm does, however,
appear quite robust in that underestimating the transition
probability does not severely degrade performance. We
remark that $\alpha^* = 1$ corresponds to a memoryless classifier
in which the fourth Bayesian update step is omitted. The
use of the Markovian property improves on the results
obtained with $\alpha^* = 1$ in all cases except $\alpha = .8$,
$\alpha^* = .2$, for which the transition probability is grossly
underestimated. Even in this case, the degradation of
performance is small.
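
The first observation can be checked directly against the counts in Table 2. The sketch below only copies those counts into a dictionary (the layout and function names are illustrative) and finds, for each true transition probability, the assumed value with the fewest misclassifications.

```python
# Misclassification counts copied from Table 2: counts[alpha_star][alpha].
counts = {
    0.2: {0.2: 2687, 0.5: 4456, 0.8: 5470},
    0.5: {0.2: 3086, 0.5: 4129, 0.8: 4858},
    0.8: {0.2: 4070, 0.5: 4346, 0.8: 4664},
    1.0: {0.2: 4775, 0.5: 4732, 0.8: 4822},
}


def best_alpha_star(alpha):
    """Assumed transition probability with the fewest errors for a given true alpha."""
    return min(counts, key=lambda a_star: counts[a_star][alpha])


def error_rate(alpha_star, alpha, n_points=10_000):
    return counts[alpha_star][alpha] / n_points


for alpha in (0.2, 0.5, 0.8):
    a_star = best_alpha_star(alpha)
    print(f"alpha = {alpha}: best alpha* = {a_star}, "
          f"error rate = {error_rate(a_star, alpha):.4f}")
```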

A slightly more refined classification algorithm makes
use of the fact that a small amount of forward information
is always available. By time t+2, when the image scanned
by filter 1 at time t is ready to be classified, the data
$\phi_1(t+1)$, $\phi_1(t+2)$, and $\phi_2(t+1)$ are also available.
By increasing the memory requirement, we can use all of this
information in classifying the source as follows:

Define

$$\lambda^{*}(i; j_1, j_2, k_1) = \sum_{l_1} f(i; j_1, k_1, l_1)
\Bigl[ (1-\alpha) \sum_{k_2} \sum_{l_2} f(i; j_2, k_2, l_2)
+ \alpha \sum_{i_2} \sum_{k_2} \sum_{l_2} f(i_2; j_2, k_2, l_2) \Bigr]$$

and

$$\lambda^{**}(i; j_1, j_2, j_3, k_1, k_2, l_1) = f(i; j_1, k_1, l_1)
\Bigl[ (1-\alpha)^2 \sum_{l_2} \sum_{k_3} \sum_{l_3}
f(i; j_2, k_2, l_2)\, f(i; j_3, k_3, l_3)$$

$$\qquad + 2\alpha(1-\alpha) \sum_{i_2} \sum_{l_2} \sum_{k_3} \sum_{l_3}
f(i_2; j_2, k_2, l_2)\, f(i_2; j_3, k_3, l_3)$$

$$\qquad + \alpha^2 \sum_{i_2} \sum_{i_3} \sum_{l_2} \sum_{k_3} \sum_{l_3}
f(i_2; j_2, k_2, l_2)\, f(i_3; j_3, k_3, l_3) \Bigr] .$$

The classification algorithm can then be specified as
follows:

$$C_1(t) = B\bigl(\pi, \lambda_1(\phi_1(t-2))\bigr),$$

$$C_2(t) = B\bigl(C_1(t), \lambda^{*}(\phi_1(t-2), \phi_1(t-1), \phi_2(t-1))\bigr),$$

$$C_3(t) = B\bigl(C_2(t), \lambda^{**}(\phi_1(t-2), \phi_1(t-1), \phi_1(t),
\phi_2(t-1), \phi_2(t), \phi_3(t))\bigr),$$

and

$$C_4(t) = B\bigl(C_3(t), (1-\alpha)\,C_4(t-1) + \alpha\,\pi\bigr) .$$

Results from this classifier are given in Table 3. While
the improvement over the results of the simpler classifier
evaluated in Table 2 is not large, the greater complexity
may be justified by the small gain achieved in critical
applications.
A graphical presentation of the probabilities of
misclassification in the 10,000-point simulation tests for the
type 1 and type 2 classifiers is given in Figures 2 and 3, for
the actual transition probabilities, $\alpha$, and the assumed
ones, $\alpha^*$.






TABLE 2

Number of Misclassifications in 10,000-Point Simulation Test

Type 1 Classifier

  $\alpha^*$ \ $\alpha$     0.2      0.5      0.8

        0.2                2687     4456     5470
        0.5                3086     4129     4858
        0.8                4070     4346     4664
        1.0                4775     4732     4822

$\alpha^*$ = assumed transition probability
$\alpha$ = true transition probability
TABLE 3

Number of Misclassifications in 10,000-Point Simulation Test

Type 2 Classifier

$\alpha^*$ = assumed transition probability
$\alpha$ = true transition probability









V. SUMMARY


The results presented, based on a classifier and a 
Markovian target-type transition process, show that the 
reduction of source entropy due to a tendency for adjacent 
targets to be of similar type can be effectively exploited 
as a source of information to improve the efficiency of a 
multistage image classifier. It is to be expected that an 
effect this large will generalize to other Bayesian classi- 
fication algorithms and other transition processes. 

The most serious limitation on the efficiency of the
classifiers arises from the necessity of using relatively
coarse contingency tables to estimate the posterior
probabilities of the source types. This defect could be
overcome either by using a larger corpus of observations to
refine the empirical frequencies or by developing an
analytical model for the conditional distribution of the
filter outputs from each type of target. The very poor
goodness-of-fit results given in Table 1 indicate that
this will probably be a difficult task.
VI. RECOMMENDATIONS

As a consequence of the demonstrated potential for
exploiting temporal coherence of the sequence of target
types scanned in order to reduce classification error,
the investigation of the underlying stochastic process
should be considered as a research objective. Studies of
the spatial coherence properties of the mix of target types
on a scale from several hundred meters to several
kilometers would be useful for this purpose.

The characterization of the probability density function
of the spectrum of each source type is also a necessity for
achieving efficient discrimination in practical applications.
While the standard multivariate normal and lognormal
densities provide a poor fit, it is likely that some
effective approximation in terms of a superposition of simple
density functions can be achieved when enough data on a
given source type become available. The application of
cluster analysis to large data samples would be useful in
this connection.
Data extraction from LARS tape using BSAM I/O.
SAS Program for Calculation of Moments.
SAS Program to Calculate Source Entropy.
SAS Program to Discretize Filter Outputs.
SAS Program for Tests of Normality and Lognormality.